[Draft][DAGCombiner] Preserve debug location of original load in fold (conv (load x)) #160236

jwu10003 · 2025-09-23T05:43:31Z

[DAGCombiner] Preserve debug location of original load in fold (conv (load x))

This patch fixes a debug information loss issue during the combine of a conversion (e.g., bitcast) with a load into a new load: fold (conv (load x)) -> (load (conv*)x).

The newly created load node was incorrectly using the debug location (SDLoc) of the conversion operation (the conv node, N) instead of the location of the original load operation (the load node, LN0). The location of the conversion operation often points to compiler-internal instructions and provides little value for source-level debugging. In contrast, the original load's location accurately represents the source of the data access in the user's code.

This change ensures the new load inherits the debug location from LN0 by using SDLoc(LN0), which improves debugging experience and fixes a test case failure observed in the Triton compiler.

github-actions · 2025-09-23T05:43:49Z

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

llvmbot · 2025-09-23T05:44:20Z

@llvm/pr-subscribers-debuginfo
@llvm/pr-subscribers-llvm-selectiondag

@llvm/pr-subscribers-backend-amdgpu

Author: None (jwu10003)

Changes

[DAGCombiner] Preserve debug location of original load in fold (conv (load x))

This patch fixes a debug information loss issue during the combine of a conversion (e.g., bitcast) with a load into a new load: fold (conv (load x)) -> (load (conv*)x).

The newly created load node was incorrectly using the debug location (SDLoc) of the conversion operation (the conv node, N) instead of the location of the original load operation (the load node, LN0). The location of the conversion operation often points to compiler-internal instructions and provides little value for source-level debugging. In contrast, the original load's location accurately represents the source of the data access in the user's code.

This change ensures the new load inherits the debug location from LN0 by using SDLoc(LN0), which improves debugging experience and fixes a test case failure observed in the Triton compiler.

Full diff: https://github.com/llvm/llvm-project/pull/160236.diff

2 Files Affected:

(modified) llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (+1-1)
(added) llvm/test/CodeGen/AMDGPU/combine-conv-load.ll (+41)

diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index a6ba6e518899f..4cb0a35aa7b25 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -16703,7 +16703,7 @@ SDValue DAGCombiner::visitBITCAST(SDNode *N) {
         }
       }
       SDValue Load =
-          DAG.getLoad(VT, SDLoc(N), LN0->getChain(), LN0->getBasePtr(),
+          DAG.getLoad(VT, SDLoc(LN0), LN0->getChain(), LN0->getBasePtr(),
                       LN0->getMemOperand());
       DAG.ReplaceAllUsesOfValueWith(N0.getValue(1), Load.getValue(1));
       return Load;
diff --git a/llvm/test/CodeGen/AMDGPU/combine-conv-load.ll b/llvm/test/CodeGen/AMDGPU/combine-conv-load.ll
new file mode 100644
index 0000000000000..900c973b712ae
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/combine-conv-load.ll
@@ -0,0 +1,41 @@
+; RUN: llc -mtriple=amdgcn -mcpu=gfx942  < %s | FileCheck %s
+
+; CHECK-LABEL:  test:
+; CHECK:        .loc    1 8 16                          ; test.py:8:16
+; CHECK-NEXT:   s_load_dword
+
+; Function Attrs: alwaysinline mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: readwrite)
+define amdgpu_kernel void @test(ptr addrspace(1) inreg readonly captures(none) %0, ptr addrspace(1) inreg writeonly captures(none) %1, ptr addrspace(1) inreg readnone captures(none) %2, ptr addrspace(1) inreg readnone captures(none) %3) local_unnamed_addr #0 !dbg !4 {
+  %5 = tail call i32 @llvm.amdgcn.workitem.id.x(), !dbg !7
+  %6 = and i32 %5, 255, !dbg !7
+  %7 = icmp eq i32 %6, 0, !dbg !7
+  br i1 %7, label %8, label %10, !dbg !7
+
+8:                                                ; preds = %4
+  %9 = load <1 x float>, ptr addrspace(1) %0, align 4, !dbg !8, !amdgpu.noclobber !6
+  store <1 x float> %9, ptr addrspace(1) %1, align 4, !dbg !7
+  br label %10, !dbg !7
+
+10:                                               ; preds = %8, %4
+  ret void, !dbg !9
+}
+
+; Function Attrs: alwaysinline nocallback nofree nosync nounwind speculatable willreturn memory(none)
+declare noundef range(i32 0, 1024) i32 @llvm.amdgcn.workitem.id.x() #1
+
+attributes #0 = { alwaysinline mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: readwrite) "amdgpu-agpr-alloc"="0" "amdgpu-flat-work-group-size"="1,256" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "amdgpu-waves-per-eu"="1,1" "denormal-fp-math-f32"="ieee" "uniform-work-group-size"="false" }
+attributes #1 = { alwaysinline nocallback nofree nosync nounwind speculatable willreturn memory(none) }
+
+!llvm.dbg.cu = !{!0}
+!llvm.module.flags = !{!2, !3}
+
+!0 = distinct !DICompileUnit(language: DW_LANG_C, file: !1, producer: "triton", isOptimized: true, runtimeVersion: 0, emissionKind: LineTablesOnly)
+!1 = !DIFile(filename: "test.py", directory: "/path")
+!2 = !{i32 2, !"Debug Info Version", i32 3}
+!3 = !{i32 1, !"amdhsa_code_object_version", i32 500}
+!4 = distinct !DISubprogram(name: "test", linkageName: "test", scope: !1, file: !1, line: 7, type: !5, scopeLine: 7, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0)
+!5 = !DISubroutineType(cc: DW_CC_normal, types: !6)
+!6 = !{}
+!7 = !DILocation(line: 9, column: 20, scope: !4)
+!8 = !DILocation(line: 8, column: 16, scope: !4)
+!9 = !DILocation(line: 9, column: 4, scope: !4)

arsenm · 2025-09-23T05:46:27Z

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

@@ -16703,7 +16703,7 @@ SDValue DAGCombiner::visitBITCAST(SDNode *N) {
        }
      }
      SDValue Load =
-          DAG.getLoad(VT, SDLoc(N), LN0->getChain(), LN0->getBasePtr(),
+          DAG.getLoad(VT, SDLoc(LN0), LN0->getChain(), LN0->getBasePtr(),


Would getMergedLocation work better?

Would getMergedLocation work better?

I did some experiments:
LN0 loc = test.py:8:16
N loc = test.py:9:20
=>
getMergedLocation(LN0, N) = test.py:0

Is this as expected?

The code is as follows. If there are any mistakes, please point them out to me.

However, the test needs to check test.py:8:16.

arsenm · 2025-09-23T05:46:33Z